Add Fortio-Envoy optimization guide#29
Signed-off-by: Vaibhav Shankar <vaibhav.shankar@intel.com>
## Overview

Evaluates Envoy running as a TCP proxy in front of Fortio, which acts as the backend load generator. The benchmark focuses on proxy-path performance and behavior under load, measuring metrics such as QPS and latency. Both server-side and client-side components are used to generate traffic and collect results, with Envoy and Fortio running in Docker containers based on the images listed below:
As a first-time reader, I find the topology unclear. Could you provide a diagram that explains the setup?
It is not clear what runs as the server and what runs as the client.
Is Fortio used as both the load generator and the server?
Is the client on a different host?
Consider starting with an overview followed by paragraphs explaining the two components, and then topology/setup used.
Overview
This tuning guide describes best known practices to optimize performance....... when you run Fortio Envoy...
Fortio
Envoy
Topology/setup
## CPU Utilization and CPU Quota

The script applies Docker CPU quotas (`--cpus 16` for Fortio, `--cpus 8` for Envoy). On a high core-count server (e.g., 128 cores/256 threads), Docker enforces these quotas via cgroup CPU bandwidth control. The OS spreads threads across all cores but throttles aggregate CPU time, resulting in roughly **6-7% per-core utilization** across all server cores, not saturation. The CPU quota is the binding constraint, not the workload.
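The low per-core figure follows from simple arithmetic. The sketch below is a back-of-envelope check, not part of the benchmark script: it assumes both containers can consume at most their full quota and that the scheduler spreads that CPU time evenly over every logical CPU. The function name is hypothetical.

```python
def per_cpu_utilization_ceiling(quota_cpus: float, logical_cpus: int) -> float:
    """Upper bound on average per-CPU utilization when `quota_cpus` worth of
    CPU time is spread evenly across `logical_cpus` (an idealized assumption)."""
    return quota_cpus / logical_cpus


fortio_quota = 16    # docker run --cpus 16
envoy_quota = 8      # docker run --cpus 8
logical_cpus = 256   # example host from the text: 128 cores / 256 threads

ceiling = per_cpu_utilization_ceiling(fortio_quota + envoy_quota, logical_cpus)
print(f"per-CPU utilization ceiling: {ceiling:.1%}")  # prints 9.4%
```

Under these assumptions the ceiling is about 9.4% per logical CPU, so an observed 6-7% is consistent with the quota, rather than the workload, limiting CPU use.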
Which script? You can provide an example script similar to this - https://github.com/intel/optimization-zone/blob/main/software/kafka/README.md#example-system-startup-script
- **Event-driven, non-blocking I/O**: Each worker thread runs an independent libevent loop.
- **`--concurrency N`**: Spawns N worker threads. Each thread owns its own listener socket and connection pool, so there is near-zero cross-thread coordination for established connections.
- **TCP proxy mode** (used here): Envoy accepts a TCP connection on port 9090, opens a connection to Fortio on 8080, and shuttles bytes between them. No L7 parsing overhead.
- **In mesh mode (`SECURE_MESH=true`)**: Adds mTLS - Envoy terminates the downstream TLS connection and re-originates a new TLS connection upstream, roughly doubling the cryptographic work per connection.
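To make the TCP proxy data path concrete, here is a minimal Python `asyncio` sketch of the same pattern: accept a downstream connection, open an upstream connection, and copy bytes in both directions without inspecting them. This is an illustration only and is not how Envoy is implemented or configured. The names `pipe` and `start_proxy` are hypothetical, and binding to `127.0.0.1` is an assumption.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer went away mid-transfer; nothing more to forward
    finally:
        writer.close()


async def start_proxy(listen_port: int, upstream_host: str, upstream_port: int,
                      listen_host: str = "127.0.0.1") -> asyncio.AbstractServer:
    """Start an L4 proxy that forwards each accepted connection upstream."""

    async def on_client(client_reader, client_writer):
        # One upstream connection per downstream connection, no L7 parsing.
        up_reader, up_writer = await asyncio.open_connection(upstream_host, upstream_port)
        await asyncio.gather(
            pipe(client_reader, up_writer),   # downstream -> upstream
            pipe(up_reader, client_writer),   # upstream -> downstream
        )

    return await asyncio.start_server(on_client, listen_host, listen_port)
```

With the ports from the text, `await start_proxy(9090, "127.0.0.1", 8080)` would listen on 9090 and forward to a backend on 8080. In mesh mode each direction would additionally pass through a TLS session, which is where the extra cryptographic work comes from.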
Earlier in the document, SECURE_MESH=true is described as “no Envoy sidecars, raw application performance” (direct mode), but here SECURE_MESH=true means Envoy terminates the TLS connection?
2. **Cache coherency traffic**: Spin locks and atomic CAS operations on shared scheduler state cause cache line bouncing across all sockets. On a multi-socket NUMA system, cross-socket coherency traffic adds latency to every lock acquisition and scales with core count.

3. **Kernel paths involved** (from perf flame graphs):
Is it possible to upload a flamegraph for reference?
### 1. NUMA Pinning (Most Impactful)

Pin both Fortio and Envoy to a single NUMA node. This is the single most impactful optimization - it substantially reduces `native_queued_spin_lock_slowpath` overhead by keeping all memory allocations, thread migrations, and NIC interrupts on the same socket.
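One possible way to express this pinning for containers is Docker's `--cpuset-cpus` and `--cpuset-mems` options, fed from the node's CPU list in sysfs. The sketch below assumes that approach; the guide's own script may pin differently (for example with `numactl`), and the helper names are hypothetical. It covers CPU and memory placement only, not NIC interrupt affinity.

```python
from pathlib import Path


def read_node_cpulist(node: int) -> str:
    """Return the kernel cpulist string for a NUMA node, e.g. '0-63,128-191'."""
    return Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()


def parse_cpulist(cpulist: str) -> list[int]:
    """Expand a cpulist string such as '0-3,8-11' into a sorted list of CPU ids."""
    cpus: set[int] = set()
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def docker_numa_flags(node: int, cpulist: str) -> list[str]:
    """Build docker run flags that confine a container's CPUs and memory to one node."""
    return ["--cpuset-cpus", cpulist.strip(), "--cpuset-mems", str(node)]
```

On the host under test, `docker_numa_flags(0, read_node_cpulist(0))` yields flags that could be added to both the Fortio and Envoy `docker run` commands, and `len(parse_cpulist(...))` shows how many logical CPUs the node offers relative to the `--cpus` quotas.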
“Pin both Fortio and Envoy to a single NUMA node”: do we mean on the server host?
Add Fortio-Envoy optimization guide and related documentation.